Third-Person Imitation Learning

Authors

  • Bradly C. Stadie
  • Pieter Abbeel
  • Ilya Sutskever

Abstract

Reinforcement learning (RL) makes it possible to train agents capable of achieving sophisticated goals in complex and uncertain environments. A key difficulty in reinforcement learning is specifying a reward function for the agent to optimize. Traditionally, imitation learning in RL has been used to overcome this problem. Unfortunately, imitation learning methods to date tend to require that demonstrations be supplied in the first person: the agent is provided with a sequence of states and a specification of the actions that it should have taken. While powerful, this kind of imitation learning is limited by the relatively hard problem of collecting first-person demonstrations. Humans address this problem by learning from third-person demonstrations: they observe other humans perform tasks, infer the task, and accomplish the same task themselves. In this paper, we present a method for unsupervised third-person imitation learning. Here, third-person refers to training an agent to correctly achieve a simple goal in a simple environment when it is provided with a demonstration of a teacher achieving the same goal but from a different viewpoint, and unsupervised refers to the fact that the agent receives only these third-person demonstrations and is not provided with a correspondence between teacher states and student states. Our method's primary insight is that recent advances from domain confusion can be utilized to yield domain-agnostic features, which are crucial during the training process. To validate our approach, we report successful experiments on learning from third-person demonstrations in a pointmass domain, a reacher domain, and an inverted pendulum domain.
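
As a rough illustration of what domain confusion means, the sketch below computes the kind of objective a shared feature extractor might minimize: a task loss (for example, telling expert observations from novice ones) minus a weighted domain loss (telling teacher viewpoint from student viewpoint). This is a minimal sketch under assumptions, not the formulation from the paper; the function names, the binary cross-entropy form, and the weight `lam` are all hypothetical and chosen only for illustration.

```python
import math


def binary_cross_entropy(p, label):
    """Cross-entropy of a predicted probability p against a 0/1 label."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def mean_loss(probs, labels):
    """Average binary cross-entropy over a batch."""
    return sum(binary_cross_entropy(p, y) for p, y in zip(probs, labels)) / len(labels)


def feature_extractor_objective(task_probs, task_labels,
                                domain_probs, domain_labels, lam=1.0):
    """Hypothetical domain-confusion objective for the shared features.

    The features should keep the task loss low while making the domain
    classifier's loss HIGH, so the domain term enters with a minus sign.
    `lam` (an assumed hyperparameter) trades off the two terms.
    """
    task_loss = mean_loss(task_probs, task_labels)
    domain_loss = mean_loss(domain_probs, domain_labels)
    return task_loss - lam * domain_loss


# Same task predictions, two different domain classifiers.
task_probs, task_labels = [0.9, 0.1], [1, 0]
domain_labels = [1, 0]  # 1 = teacher viewpoint, 0 = student viewpoint (assumed encoding)

# Domain classifier is confused (outputs 0.5): features look domain-agnostic.
confused = feature_extractor_objective(task_probs, task_labels, [0.5, 0.5], domain_labels)
# Domain classifier is confident and correct: features leak the viewpoint.
separable = feature_extractor_objective(task_probs, task_labels, [0.99, 0.01], domain_labels)
```

With identical task predictions, the objective is lower when the domain classifier is reduced to guessing, which is the sense in which minimizing it pushes the features toward being domain-agnostic.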

Journal:
  • CoRR

Volume: abs/1703.01703

Publication date: 2017